RNA Structural Alignment with Conditional Random Fields

Authors

  • Kengo Sato
  • Yasubumi Sakakibara
Abstract

Computational identification of non-coding RNA regions on the genome has attracted much attention. However, it is essentially harder than gene finding for protein-coding regions because non-coding RNA sequences do not have strong statistical signals. Since comparative sequence analysis is effective for non-coding RNA detection, efficient computational methods are needed for structural alignment of RNA sequences. [5] proposed one such algorithm, called pair hidden Markov models on tree structures (PHMMTSs), which can calculate a structural alignment of a binary tree and a sequence, and applied PHMMTSs to aligning RNA secondary structures, that is, pairwise alignment of an unfolded RNA sequence to an RNA sequence of known secondary structure. Calculating structural alignments of RNA sequences requires parameters, namely the substitution probabilities of base pairs and the state transition probabilities, which strongly affect alignment performance. There is some related work on estimating the parameters for aligning RNA secondary structures. For example, [3] proposed a ribosomal RNA substitution matrix, called RIBOSUM, which is built by a method analogous to that used for the BLOSUM matrices. However, since the RIBOSUM matrix is based on maximum likelihood estimation from the relative frequencies of RNA mutations, it requires a large number of high-quality structure-annotated alignments to avoid overfitting. Therefore, more effective methods for estimating the parameters for RNA structural alignment should be developed.
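To make the overfitting concern concrete, the following is a minimal sketch of a log-odds substitution score estimated from relative frequencies. It is a simplified single-nucleotide analogue, not the actual RIBOSUM procedure of [3]; the function name `log_odds_scores`, the `pseudocount` parameter, and the toy data are assumptions introduced only for illustration.

```python
import math
from collections import Counter

def log_odds_scores(aligned_pairs, alphabet="ACGU", pseudocount=0.0):
    """Hypothetical helper: log2(p_ab / (q_a * q_b)) from relative frequencies.

    aligned_pairs is an iterable of (a, b) nucleotides seen in the same
    alignment column. With pseudocount=0 this is the plain maximum
    likelihood estimate, so any pair never observed scores -inf.
    """
    counts = Counter()
    for a, b in aligned_pairs:
        # Count both orientations so the resulting matrix is symmetric.
        counts[(a, b)] += 1
        counts[(b, a)] += 1

    total = sum(counts.values()) + pseudocount * len(alphabet) ** 2
    joint = {
        (a, b): (counts[(a, b)] + pseudocount) / total
        for a in alphabet
        for b in alphabet
    }
    # Background frequency of each nucleotide is the marginal of the joint.
    background = {a: sum(joint[(a, b)] for b in alphabet) for a in alphabet}

    scores = {}
    for (a, b), p in joint.items():
        expected = background[a] * background[b]
        if p > 0 and expected > 0:
            scores[(a, b)] = math.log2(p / expected)
        else:
            scores[(a, b)] = float("-inf")
    return scores

# Toy data (made up): a small sample in which A/C substitutions never occur.
toy_pairs = (
    [("A", "A")] * 8 + [("A", "G")] * 2 + [("G", "G")] * 6
    + [("C", "C")] * 4 + [("U", "U")] * 4
)
ml_scores = log_odds_scores(toy_pairs)
smoothed_scores = log_odds_scores(toy_pairs, pseudocount=1.0)
```

With so few observations, the plain relative-frequency estimate assigns an unobserved substitution such as A/C a score of negative infinity, whereas the smoothed variant keeps it finite. This is one simple way to see why estimates of this kind need large, high-quality training alignments.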


Similar resources

Hidden Conditional Random Fields with M-to-N Alignments for Grapheme-to-Phoneme Conversion

Conditional Random Fields have been successfully applied to a number of NLP tasks like concept tagging, named entity tagging, or grapheme-to-phoneme conversion. When no alignment between source and target side is provided with the training data, it is challenging to build a CRF system with state-of-the-art performance. In this work, we present an approach incorporating an M-to-N alignment as a h...


Conditional Random Fields for Modeling Protein Families with Structural Information

A statistical model of protein families, called profile conditional random fields (CRF), is proposed. This model may be regarded as an integration of the profile hidden Markov model (HMM) and the Finkelstein-Reva (FR) theory of protein folding. While the model structure of the profile CRF is almost identical to the profile HMM, it can incorporate many-body correlations in the sequences to be al...


RNA secondary structure prediction using conditional random fields model

Non-coding RNAs (ncRNAs) have important biological functions in living cells dependent on their conserved secondary structures. Here, we focus on computational RNA secondary structure prediction by exploring primary sequences and complementary base pair interactions using the Conditional Random Fields (CRFs) model, which treats RNA prediction as a sequence labelling problem. Proposing suitable ...


Conditional Random Fields for Airborne Lidar Point Cloud Classification in Urban Area

Over the past decades, urban growth has been known as a worldwide phenomenon that includes widening process and expanding pattern. While the cities are changing rapidly, their quantitative analysis as well as decision making in urban planning can benefit from two-dimensional (2D) and three-dimensional (3D) digital models. The recent developments in imaging and non-imaging sensor technologies, s...


Unsupervised Alignment for Segmental-based Language Understanding

Recent years’ most efficient approaches for language understanding are statistical. These approaches benefit from a segmental semantic annotation of corpora. To reduce the production cost of such corpora, this paper proposes a method that is able to match first identified concepts with word sequences in an unsupervised way. This method based on automatic alignment is used by an understanding sy...




Publication date: 2005